We formally characterize a set of causality-based properties of metabolic networks. This set of properties
aims at making precise several notions on the production of metabolites that are familiar in the
biologists' terminology. From a theoretical point of view, biochemical reactions are abstractly
represented as causal implications, and the produced metabolites as causal consequences of the
implication representing the corresponding reaction. The fact that a reactant is produced is represented
by means of the chain of reactions that have made it exist. Such a representation abstracts away from
quantities and from stoichiometric and thermodynamic parameters, and constitutes the basis for the
characterization of our properties. Moreover, we propose an effective method for verifying our properties,
based on an abstract model of system dynamics. This consists of a new abstract semantics for the
system, seen as a concurrent network and expressed using the Chemical Ground Form [6] calculus.
We illustrate an application of this framework to a portion of a real metabolic pathway.

1 Introduction 

Understanding the relationships amongst the elements of biological interaction networks is a relevant
problem in Systems Biology. In the words of [23], "diagrams of interconnections represent a sort of
static roadmaps, but what we really seek to know are the traffic patterns, why such patterns emerge, and
how we can control them". Formal descriptions of interconnections and methodologies for performing
traffic simulations in silico can guide in vitro experimentation.

We focus here on metabolic networks, i.e. the set of the cellular biochemical pathways involved in 
energy management and in the synthesis of structural components. Biochemical pathways are typically 
composed of chains of enzymatically catalyzed chemical reactions and are interconnected in a complex 
way. This makes it difficult to understand the overall emerging behaviour of a network, starting from the
detailed knowledge of the single reactions. 

An interesting issue is the identification of the parts of a network whose integrity is crucial for certain 
functionalities. These "hot points" represent candidate drug targets for repressing undesired metabolic 
functions involved in pathological states, such as infectious diseases and cancer [j9|[I21. Several proper- 
ties characterizing different aspects of the network functionalities have been introduced in the biological 
literature, often with slightly different versions for the same property. What formal methods can offer is 
a way to make precise and classify properties, too often expressed only at an intuitive level. 

Since, broadly speaking, causality plays a key role in finding chains of reactions that connect the
parts of a network, we base our understanding of properties on causality relations. Following
the approach in lH, in order to give a formal characterization of causality-based properties, we interpret
biochemical reactions as "logical consequences", where the source metabolites can cause, i.e. produce,
the target ones. Furthermore, we adopt the notion of explanation of a certain metabolite. Given a set of
reactions and initial conditions, an explanation represents the chain of reactions, each causally dependent
on the previous ones, that leads to the metabolite. Our approach therefore models the biochemical dynamics,
capturing causal dependencies, while abstracting away from other aspects, like quantities, stoichiometric 
and thermodynamic parameters. 

On top of the causality notion, we formalize several properties from a potentially longer list. Beyond 
the relevance of their biological meaning, these properties show how the few and simple ingredients we 
propose are sufficiently expressive to make precise several common notions, often intuitively used in 
biology. Specifically, the set of properties we present concerns the role and the relations of metabolites 
and reactions within a metabolic network. 

We propose an effective method for verifying the formalized causal properties, based on the
construction, once and for all, of an abstract representation of the dynamics of the biological system. The system
is specified as a concurrent network in terms of the Chemical Ground Form calculus [7]. We opted for
the CGF for its extreme simplicity and its well-established theories and techniques, while it is, at the same
time, sufficiently concrete to capture our properties. For our verification purposes, we have defined a
slightly different semantics from the one in dEOJl. It is worth pointing out that our choice mainly strives
for simplicity. Other specification languages suitable for biological networks, e.g. |[29l l34l l33l l8l l2TTl. 
could have been adopted as well, some perhaps even more expressive, but generally requiring higher 
costs for model construction and verification procedures. 

Overall, we are interested in efficiently evaluating the impact of changes on working hypotheses, 
such as the variation of the initial conditions and of the sets of reactions, according to a what-if strategy. 
The method we propose is meant to be exploited as a sort of preliminary in silico screening, aiming 
at determining the most promising experiments to be carried out in vitro. Finally, we believe that our 
framework should be palatable to biologists, since it is very close to the biochemical intuition of causality 
and to the spirit of many informal notions currently in use. 

Related Work. Due to recent progress in wet-lab techniques, many metabolic networks are
structurally well characterized and can be reconstructed for many organisms up to the genome-scale level
(see e.g. [30]). However, approaches grounded on dynamical modeling, e.g. Metabolic Control Analysis
or Metabolic Flux Analysis (see [16]), may encounter difficulties, mainly because some of the needed
kinetic parameters are unknown. In contrast, structure-oriented analysis only requires information about
the topology of the investigated networks, which is often known. Even though this kind of approach may 
not provide a detailed knowledge of the dynamics underlying the target phenomenon, it allows key prop- 
erties of metabolic networks to be addressed, as demonstrated by the plethora of works in the literature. 
We mention here PTI and [40], where the authors propose to exploit "elementary modes" or "extreme 
pathways" to perform pathway analysis and to assess structural properties, such as structural robustness 
and redundancy. In B2l, a method is presented that relies on the network structure for predicting
robustness in gene regulation networks, while reviews a group of works in which recurring patterns of
interaction in biochemical networks (a.k.a. network motifs) are identified and related to specific
behaviors or network robustness. In [26], a novel method is used to target those nodes whose deletion causes
the failure of certain network functionalities.

Process algebras have often been used to model biological systems abstractly as concurrent systems,
e.g. [17, 39, 38, 12, 13, 6, 36]. Closer to our approach is the work presented in iTTTTl, where the authors
apply a causal semantics of the π-calculus [27] in order to describe biochemical processes. We use
CGF instead, with a simpler semantics that is nonetheless suitable for establishing the causal dependencies of interest.
Our results are close to those obtained by applying Control Flow Analysis (CFA), a quite efficient
static technique, to process calculi used for modeling biological systems, e.g. [28, 29, 34, 33, 4]. In all these
cases, an over-approximation of the behaviour of a system is offered. In particular, the analyses presented
in [34, 33] capture causality information relevant for interpreting biological phenomena, and the authors
propose a formalization of properties, as in our approach. Temporal and causal properties are also
addressed in EH, where an Abstract Interpretation Analysis for systems specified in the BioAmbients
calculus is used to model the quantities of molecules involved in interactions.

Our approach also shares some similarities with BIOCHAM [8] and the Pathway Logic 031.
BIOCHAM is based on the Biochemical Abstract Machine, which offers a formal modeling
environment for biochemical processes and qualitative descriptions of these processes. It relies on
a rule-centered language for specifying biochemical systems and, unlike our approach, it
provides tools for querying temporal properties expressed in Computation Tree Logic. Pathway Logic
uses rewriting logic for modeling biological pathways and for enabling their symbolic analysis.
In a way similar to ours, biochemical reactions are rendered in terms of rules acting on molecules. Both
these approaches allow biochemical networks to be specified at a high level of abstraction. However,
some of their expressible features, e.g. the distinction among different classes of molecules or reactions,
appeared too detailed for the aim of tracking causality and for our quest for a skeletal language for
characterizing relevant causality-based properties.

As discussed, several of the mentioned approaches may provide more detailed models and
properties than ours; however, they generally require computationally expensive verification techniques. Our
proposal combines the formalization of properties with a lightweight computational machinery that is,
in some regards, approximate.

Synopsis. The metabolic network model is illustrated in §2, properties are formalized in §3 and the 
process-algebraic computational framework is introduced in §4. An example is discussed in §5. 



2 A formal model of metabolic networks 

We give an abstract representation of metabolic networks and of the corresponding biochemical
reactions. More precisely, we abstract away from the quantities, stoichiometric proportions, and kinetic or
thermodynamic parameters involved in reactions. For example, consider a standard biochemical reaction like:

$$aA + bB \xrightarrow{r} cC + dD \qquad (1)$$

where A, B, C and D are the species involved, a, b, c and d are the corresponding stoichiometric
coefficients, and r represents the rate at which reactants become products. We abstractly represent (1) as:

$$A \circ B \to C \circ D \qquad (2)$$

We focus on the fact that the presence of both A and B represents the possibility for C and D to be
produced or caused. Furthermore, we abstract from the dynamic evolution of the network, implicitly
assuming that reactants are never consumed, which is also an abstraction over their quantities. As
a consequence, (2) reads as $A \circ B \to A \circ B \circ C \circ D$. Our model therefore gives an over-approximation of
the set of the actual pathways, possibly including some pathways that could actually be prevented, for
instance, by the lack of a suitable quantity of reactants or by an inadequate temperature.

For easing the computational machinery, we further decompose any rule causing more than one
metabolite into a set of rules with only one caused metabolite each, e.g. rule (2) becomes $A \circ B \to C$ (3)
plus $A \circ B \to D$ (4). This transformation does not affect causality: the set of metabolites producible
by the original rule can still be produced by applying the new rules, as the premises are the same.
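The abstraction and decomposition steps above can be sketched in Python. This is a minimal illustration, not the authors' tooling: the `Rule` type, the function `abstract_reaction`, the rule-naming scheme `r.C`, and the coefficients used in the example are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    """An abstract rule: a set of premises causing exactly one product."""
    name: str
    premises: FrozenSet[str]
    product: str


def abstract_reaction(name: str,
                      reactants: List[Tuple[int, str]],
                      products: List[Tuple[int, str]]) -> List[Rule]:
    """Drop stoichiometric coefficients (and the rate) and split the reaction
    into one rule per product. Every new rule keeps the same premises, so the
    set of producible metabolites is unchanged."""
    premises = frozenset(species for _coeff, species in reactants)
    return [Rule(f"{name}.{species}", premises, species)
            for _coeff, species in products]


# aA + bB ->_r cC + dD with hypothetical coefficients a=1, b=2, c=1, d=3
rules = abstract_reaction("r", [(1, "A"), (2, "B")], [(1, "C"), (3, "D")])
for rule in rules:
    print(rule.name, sorted(rule.premises), "->", rule.product)
# r.C ['A', 'B'] -> C
# r.D ['A', 'B'] -> D
```

Only the species names survive the abstraction; the coefficients and the rate r are discarded.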

Finally, following [20], since reactions involving more than two species are unlikely, we only
address reactions with at most two causing metabolites. To summarize, in the reactions we
consider, either two molecules produce a new molecule, or a molecule degrades into another one.

Definition 2.1 (Rules) Given a finite set of metabolites M, ranged over by $A, A_i, B, C, D, \ldots$, a
rule is either of the form (1) $A_1 \circ A_2 \to C$, or (2) $A \to C$.

The description of causal relations within a metabolic network can be obtained by defining a set of 
reaction rules R that describe how new metabolites can be produced, and a set S of metabolites, initially 
present in the network solution, which can be seen as premise-less rules. 

Definition 2.2 (m-network and Initial Solution) An m-network R is a finite set of rules with non-empty
premises. An initial solution S is a finite set of premise-less rules of the form $\to A$.

The fact that a metabolite is caused by a network is made precise by means of the following definition 
relating the metabolite to a chain of reactions that produce it. 



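Definition 2.3 (Explanation) $\mathcal{E}_{S,R}(C)$ is an explanation for $C \in M$ with respect to S and R if either

• $C \in S$ and $\mathcal{E}_{S,R}(C) = C[\,]$, or

• $A_1 \circ A_2 \to C = r \in R$, $\exists\, \mathcal{E}_{S,R}(A_1), \mathcal{E}_{S,R}(A_2)$ and $\mathcal{E}_{S,R}(C) = C_r[\mathcal{E}_{S,R}(A_1), \mathcal{E}_{S,R}(A_2)]$, or

• $A \to C = r \in R$, $\exists\, \mathcal{E}_{S,R}(A)$ and $\mathcal{E}_{S,R}(C) = C_r[\mathcal{E}_{S,R}(A)]$.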
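Whether an explanation exists can be decided by forward chaining, in the style of Horn-clause inference. The following Python sketch is our own illustration, not the paper's procedure: it uses a hypothetical encoding of $C[\,]$ as `("C", None, ())` and of $C_r[\ldots]$ as `("C", "r", (...))`, and it builds one explanation for each metabolite that has one, without enumerating all of them. The rule names `r1` and `r2` are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    name: str
    premises: Tuple[str, ...]  # (A,) for A -> C, or (A1, A2) for A1 o A2 -> C
    product: str


# C[] is encoded as ("C", None, ()); C_r[E1, E2] as ("C", "r", (E1, E2)).
Explanation = Tuple[str, Optional[str], tuple]


def explain(solution: Iterable[str], rules: Iterable[Rule]) -> Dict[str, Explanation]:
    """Return one explanation for every metabolite that has one w.r.t. S and R.

    Start from C[] for each C in S, then fire any rule whose premises are all
    explained, until no new metabolite appears. Each metabolite keeps the first
    explanation found, and that same sub-explanation is reused wherever the
    metabolite occurs later on."""
    found: Dict[str, Explanation] = {c: (c, None, ()) for c in solution}
    pending = list(rules)
    changed = True
    while changed:
        changed = False
        for r in pending:
            if r.product not in found and all(a in found for a in r.premises):
                found[r.product] = (r.product, r.name,
                                    tuple(found[a] for a in r.premises))
                changed = True
    return found


S = ["A", "B"]
R = [Rule("r1", ("A", "B"), "D"), Rule("r2", ("D",), "C")]
explanations = explain(S, R)
print(explanations["C"])
# ('C', 'r2', (('D', 'r1', (('A', None, ()), ('B', None, ()))),))
print("E" in explanations)  # False: E has no explanation
```

Because each metabolite keeps a single explanation, every tree built this way uses the same sub-explanation for a metabolite wherever it occurs.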



Note that a metabolite can be initially present in the solution or be produced anew by the network,
or both. These cases can be distinguished by the structure of the corresponding explanations. For simplicity,
in the following definitions we only report the case of rules of the form $A_1 \circ A_2 \to C$, leaving
out the simpler case of rules of the form $A \to C$, where $\mathcal{E}_{S,R}(C) = C_r[\mathcal{E}_{S,R}(A)]$. To observe the
structure of explanations, we resort to the following auxiliary definition.

Definition 2.4 Given an explanation $\mathcal{E}_{S,R}(C)$,

• the set of metabolites required for C, written $\mathcal{M}(\mathcal{E}_{S,R}(C))$, is defined as follows:

• the set of reactions required for C, written $\mathcal{R}(\mathcal{E}_{S,R}(C))$, is defined as follows:

Of course, given S and R, there might be more than one explanation for the same metabolite C, corresponding
to different ways of producing it. In turn, the explanation of another metabolite D that uses the metabolite C
more than once could include different explanations for C at different points. For the sake of simplicity,
we assume that only one explanation is used for each metabolite inside another explanation. For this reason,
we introduce the notion of a uniform explanation.

Definition 2.5 (Uniform Explanation) An explanation $\mathcal{E}_{S,R}(C)$ for $C \in M$ w.r.t. S and R is a uniform
explanation (written $\mathcal{E}^{U}_{S,R}(C)$) if it is an explanation for $C \in M$ and, for all $D \in \mathcal{M}(\mathcal{E}^{U}_{S,R}(C))$, if $\mathcal{E}_{S,R}(D)$ and
$\mathcal{E}'_{S,R}(D)$ occur in $\mathcal{E}^{U}_{S,R}(C)$, then $\mathcal{E}_{S,R}(D) = \mathcal{E}'_{S,R}(D)$, i.e. $\mathcal{E}^{U}_{S,R}(C)$ does not contain two different explanations
for the same metabolite D.
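The sets of Definition 2.4 and the uniformity condition of Definition 2.5 can be computed by walking the explanation tree. The Python sketch below assumes that $\mathcal{M}$ collects every metabolite labelling a node of the explanation (C itself included) and that $\mathcal{R}$ collects every rule applied in it; this is our reading, not a formula quoted from the paper. The tuple encoding and the rule names `r1` to `r4` are hypothetical.

```python
from typing import FrozenSet, Optional, Tuple

# Hypothetical encoding: C[] is ("C", None, ()), C_r[E1, E2] is ("C", "r", (E1, E2)).
Explanation = Tuple[str, Optional[str], tuple]


def required_metabolites(e: Explanation) -> FrozenSet[str]:
    """Assumed reading of M(E(C)): every metabolite labelling a node of e."""
    metabolite, _rule, children = e
    result = {metabolite}
    for child in children:
        result |= required_metabolites(child)
    return frozenset(result)


def required_rules(e: Explanation) -> FrozenSet[str]:
    """Assumed reading of R(E(C)): every rule applied in e (none for C[])."""
    _metabolite, rule, children = e
    result = set() if rule is None else {rule}
    for child in children:
        result |= required_rules(child)
    return frozenset(result)


def is_uniform(e: Explanation) -> bool:
    """Def. 2.5: no metabolite occurs in e with two different explanations."""
    seen = {}
    stack = [e]
    while stack:
        node = stack.pop()
        # setdefault returns the explanation already recorded for this metabolite
        if seen.setdefault(node[0], node) != node:
            return False
        stack.extend(node[2])
    return True


a, b = ("A", None, ()), ("B", None, ())
d = ("D", "r1", (a, b))                      # D_r1[A[], B[]]
c = ("C", "r2", (d,))                        # C_r2[D_r1[A[], B[]]]
print(sorted(required_metabolites(c)))       # ['A', 'B', 'C', 'D']
print(sorted(required_rules(c)))             # ['r1', 'r2']
mixed = ("E", "r3", (d, ("D", "r4", (a,))))  # two different explanations for D
print(is_uniform(c), is_uniform(mixed))      # True False
```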
Since we are going to observe the whole set of explanations for each metabolite in order to
characterize our properties, we do not lose generality by restricting ourselves to uniform explanations. From
now on, we will then use only uniform explanations and therefore we will omit the superscript U. The
following result relates general and uniform explanations.

Theorem 2.6 Given S and R, we have that $\exists\, \mathcal{E}_{S,R}(C)$ iff $\exists\, \mathcal{E}^{U}_{S,R}(C)$, for all $C \in M$.

3 Causality-based properties 

Several properties regarding metabolic networks, which are widely accepted at an informal level, can be 
made precise within our framework. In particular, reasoning in terms of explanations adds an extra
level of detail to the definition of the properties of interest, while an explicit characterization
of the network environment allows us to take into account the different conditions under which a
network may work. We present properties that can be interpreted in terms of our notions of causality
and explanations and that, given the abstraction of our model, are qualitative properties. We group them
into properties about reactions and properties about networks. The former allow us to interpret the results of
perturbative experiments, due to variations of the initial solution S or of the rules in R, while network
properties have to do with robustness.

Reaction properties Often, the rules defining the reactions of a metabolic network correspond to en- 
zymes that catalyze such reactions or to genes that code for such enzymes or for the proteins involved 
in reactions. Rules are hence the main object of study when analysing a network's behavior, and it is quite
natural to try to characterize their role in the production of metabolites. The next definition states when a
rule has to be considered essential for the production of a given metabolite. A rule is essential if it is
not dispensable, i.e. the network, deprived of it (e.g. by knocking out the corresponding gene), is not
able to produce the metabolite. Generally, in the biological literature, the notion of essentiality has been
expressed informally and often referred to the elusive notion of viability of an organism, e.g. [19].

Definition 3.1 (Essentiality) Given R, a rule $r \in R$ is essential in S for the metabolite C iff $\exists\, \mathcal{E}_{S,R}(C)$
and $\nexists\, \mathcal{E}_{S,R \setminus \{r\}}(C)$.
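Essentiality corresponds to an in silico knock-out: check what the network produces with and without the rule. The Python sketch below assumes that the existence of an explanation is decided by a forward-chaining fixpoint over the rules (the `producible` helper); the `Rule` type, the helper names, and the toy network with rules `r1` to `r4` are hypothetical and not taken from the paper.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    name: str
    premises: Tuple[str, ...]
    product: str


def producible(solution: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Metabolites with an explanation w.r.t. S and R (forward-chaining fixpoint)."""
    known = set(solution)
    pending = list(rules)
    changed = True
    while changed:
        changed = False
        for r in pending:
            if r.product not in known and all(a in known for a in r.premises):
                known.add(r.product)
                changed = True
    return known


def is_essential(rule: Rule, solution: List[str], rules: List[Rule], target: str) -> bool:
    """Def. 3.1: target is producible with R, but not with R minus the rule."""
    knocked_out = [r for r in rules if r != rule]
    return (target in producible(solution, rules)
            and target not in producible(solution, knocked_out))


S = ["A", "B"]
R = [Rule("r1", ("A", "B"), "D"), Rule("r2", ("D",), "C"),
     Rule("r3", ("B",), "E"), Rule("r4", ("E",), "C")]
print([r.name for r in R if is_essential(r, S, R, "D")])  # ['r1']
print([r.name for r in R if is_essential(r, S, R, "C")])  # []: two routes lead to C
```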

Note that if a rule r is essential in S for the metabolite C, then all the explanations of C use r. From a 
biological point of view, it can be significant to distinguish between two degrees of essentiality. In the
first case, essential rules correspond to those reactions whose essentiality holds only in a given solution
S. Characterizing these "hot points" in a biochemical network operating in a given solution can be
useful when the studied networks typically reside in a well-defined environment. This is the case,
e.g., of drug development for cancer therapy, as malignant cells typically live in human blood or
inter-cellular matrix. Essential rules in the metabolic network of malignant cells represent potential targets
for anti-cancer drugs designed for disrupting that network. Since cancerous cells always act in a unique
environment, it is important to identify their "weak points" always considering an initial solution S
resembling the composition of human blood or inter-cellular matrix. In contrast, when the target system
is an organism capable of living in various environments (such as a bacterium), identifying a stronger kind
of essentiality, where a rule is essential for all possible solutions S, turns out to be a better choice in order
to find "universal" targets for inhibiting the production of a given metabolite. Note that verifying this
second kind of essentiality for a certain metabolite C is straightforward, because it simply amounts to
verifying whether there is only one rule (not having C in the premise) for producing C.
Relationships between rules have also been traditionally explored, as has been done with the notion
of mutual essentiality, e.g. [46]. We say that two rules are mutually essential for C when their individual
exclusion does not prevent the production of C, i.e. neither of the two rules is essential, but their
simultaneous exclusion does. Detecting mutually essential reactions can be useful, again, in drug research for
identifying multiple targets for drugs against parts of a network that represent functional alternatives for
the production of a given metabolite.

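Definition 3.2 (Mutual essentiality) Given R, the rules $r_1, r_2 \in R$ are mutually essential in S for the
metabolite C iff $\exists\, \mathcal{E}_{S,R \setminus \{r_1\}}(C)$ and $\exists\, \mathcal{E}_{S,R \setminus \{r_2\}}(C)$, while $\nexists\, \mathcal{E}_{S,R \setminus \{r_1,r_2\}}(C)$.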
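Mutual essentiality can be tested with three knock-outs per pair of rules. In the following Python sketch, the forward-chaining `producible` helper, the function `mutually_essential`, and the toy network are hypothetical; the sketch simply checks all pairs with `itertools.combinations`, which is quadratic in the number of rules.

```python
from itertools import combinations
from typing import Iterable, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    name: str
    premises: Tuple[str, ...]
    product: str


def producible(solution: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Metabolites with an explanation w.r.t. S and R (forward-chaining fixpoint)."""
    known = set(solution)
    pending = list(rules)
    changed = True
    while changed:
        changed = False
        for r in pending:
            if r.product not in known and all(a in known for a in r.premises):
                known.add(r.product)
                changed = True
    return known


def mutually_essential(r1: Rule, r2: Rule, solution: List[str],
                       rules: List[Rule], target: str) -> bool:
    """Def. 3.2: target survives removing r1 alone or r2 alone, but not both."""
    def survives(excluded: Set[Rule]) -> bool:
        remaining = [r for r in rules if r not in excluded]
        return target in producible(solution, remaining)
    return survives({r1}) and survives({r2}) and not survives({r1, r2})


S = ["A", "B"]
R = [Rule("r1", ("A", "B"), "D"), Rule("r2", ("D",), "C"),
     Rule("r3", ("B",), "E"), Rule("r4", ("E",), "C")]
pairs = [(x.name, y.name) for x, y in combinations(R, 2)
         if mutually_essential(x, y, S, R, "C")]
print(pairs)  # [('r1', 'r3'), ('r1', 'r4'), ('r2', 'r3'), ('r2', 'r4')]
```

Each reported pair takes one rule from each of the two alternative routes to C (via D or via E).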

Moreover, we establish that two explanations for a metabolite C are vicarious when they use different 
sets of rules, thus representing two different ways of producing C. 

Definition 3.3 (Vicariate) Given R, S, and a metabolite C, an explanation $\mathcal{E}'_{S,R}(C)$ is vicarious of
$\mathcal{E}_{S,R}(C)$ iff $\mathcal{R}(\mathcal{E}_{S,R}(C)) \neq \mathcal{R}(\mathcal{E}'_{S,R}(C))$.

This property is related to the previous one, e.g. if two rules $r_1$ and $r_2$ are mutually essential for C,
then $\mathcal{E}'_{S,R \setminus \{r_1\}}(C)$ is vicarious of $\mathcal{E}_{S,R \setminus \{r_2\}}(C)$.

Furthermore, we investigate the order in which different metabolites are produced, and in particular 
we determine whether the production of a metabolite is a necessary condition (i.e. it is a checkpoint) for 
the production of another one. 

Definition 3.4 (Checkpoint) Given R and an initial solution S, B is necessary for C iff for all
explanations $\mathcal{E}_{S,R}(C)$ of C, $B \in \mathcal{M}(\mathcal{E}_{S,R}(C))$.
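One way to decide the checkpoint property without enumerating every explanation, assuming $\mathcal{M}$ collects all metabolites occurring in an explanation, is to block B: remove it from S and drop every rule that produces or consumes it. An explanation of C avoids B exactly when it is still available after blocking, so B is necessary for C iff C is no longer producible. This reduction is our own reformulation, not a procedure stated in the paper; the helper names and the toy network are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    name: str
    premises: Tuple[str, ...]
    product: str


def producible(solution: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Metabolites with an explanation w.r.t. S and R (forward-chaining fixpoint)."""
    known = set(solution)
    pending = list(rules)
    changed = True
    while changed:
        changed = False
        for r in pending:
            if r.product not in known and all(a in known for a in r.premises):
                known.add(r.product)
                changed = True
    return known


def is_checkpoint(b: str, solution: List[str], rules: List[Rule], target: str) -> bool:
    """B necessary for C: C becomes unproducible once B is blocked.
    Vacuously true when C had no explanation to begin with, as in Def. 3.4."""
    blocked_solution = [m for m in solution if m != b]
    blocked_rules = [r for r in rules if r.product != b and b not in r.premises]
    return target not in producible(blocked_solution, blocked_rules)


S = ["A", "B"]
R = [Rule("r1", ("A", "B"), "D"), Rule("r2", ("D",), "C"),
     Rule("r3", ("B",), "E"), Rule("r4", ("E",), "C")]
print([m for m in ["A", "B", "D", "E"] if is_checkpoint(m, S, R, "C")])  # ['B']
```

Here B occurs in both explanations of C (through D and through E), while A, D and E each lie on only one route.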

Identifying checkpoints offers some insights into the structure of metabolic networks. From a topological
point of view, checkpoint elements can be related to "bottlenecks" in molecular interaction networks
[47]. As shown in [47], these elements, due to their strategic position in the network, are candidates for
being essential, as are the reactions through which they are produced.

Similarly to the previous property, one can be interested in the order between rules and in whether the
application of some rules of R is a necessary condition for the application of other rules.

Definition 3.5 (Causality) Given R including $r_1$ and $r_2$, and an initial solution S, let the metabolite
C be the conclusion of rule $r_2 \in R$. The rule $r_1$ causes $r_2$ ($r_1 \sqsubset r_2$) iff for all explanations $\mathcal{E}_{S,R}(C) =
C_{r_2}[\mathcal{E}_{S,R}(A_1), \mathcal{E}_{S,R}(A_2)]$, either $r_1 \in \mathcal{R}(\mathcal{E}_{S,R}(A_1))$ or $r_1 \in \mathcal{R}(\mathcal{E}_{S,R}(A_2))$.

Note that if $r_1 \sqsubset r_2$, then the metabolite produced by rule $r_1$, say A, is necessary for the metabolite
produced by rule $r_2$, say B, while if A is necessary for B it can be the case that $r_1 \not\sqsubset r_2$. This
property can also be exploited (possibly synergistically with the checkpoint property) to gain topological
insights into the investigated network. For instance, if a rule r causes a group of other rules, we can
say that r acts as a bottleneck.

The next property is useful to reason about which metabolites can be omitted from the initial solution
without compromising the initial capability of the system to produce metabolites in many different ways.
Roughly speaking, a metabolite can be omitted from the initial solution because it is not necessary in the
production of a given C, or because it is necessary but the system is already able to produce it.

Identifying these metabolites can aid in metabolic engineering P31, e.g. when, for optimizing
resource usage, the minimal environment needed for a bioreactor has to be characterized. Note that
the so-called conditional mutants differ from the wild type (i.e. the microorganism possessing the genome
commonly found in nature) only in the minimal environment needed for their viability. The genome of a
conditional mutant does not code for an enzyme essential for its life, and its survival is conditioned by
the presence in S of the metabolite produced by the missing reaction.
Definition 3.6 (Redundancy) Given R and a metabolite C, an initial solution S is redundant for C
iff there exists at least a metabolite $B \in S$ s.t. for all $\mathcal{E}_{S,R}(C)$ there exists $\mathcal{E}_{S \setminus \{B\},R}(C)$ such that
$\mathcal{R}(\mathcal{E}_{S,R}(C)) \subseteq \mathcal{R}(\mathcal{E}_{S \setminus \{B\},R}(C))$.

According to this definition, a given initial solution S can be redundant for a metabolite C in the 
wild-type but not redundant for the same metabolite in the conditional mutant. Of course, the previous
property can be weakened, yielding another property that checks whether R is still able to produce C
after the exclusion of the reagent B from the initial solution. In other words, it addresses the impact of
the exclusion of some metabolites from the initial solution, offering straightforward applications to the 
resource optimization problem described above. 

Definition 3.7 (Exclusion) Given R, a metabolite $B \in S$ cannot be excluded for the production of
metabolite C iff $\nexists\, \mathcal{E}_{S \setminus \{B\},R}(C)$.

Note that B cannot be excluded for the production of C if and only if, for all explanations $\mathcal{E}_{S,R}(C)$, $B[\,]$
occurs in $\mathcal{E}_{S,R}(C)$. The previous property is in some way related to the checkpoint property expressed
before: if B belongs to S and is not necessary for the production of C, then we can harmlessly exclude it
from the initial solution. However, in general, the two properties do not coincide (see Ex. 4.13).
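The exclusion property only requires recomputing what S without B can produce. The Python sketch below (hypothetical helper names and toy networks, forward-chaining `producible` as an assumed decision procedure) also illustrates why exclusion and checkpoints differ: a metabolite that every explanation uses can still be excluded from S when the network regenerates it.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    name: str
    premises: Tuple[str, ...]
    product: str


def producible(solution: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Metabolites with an explanation w.r.t. S and R (forward-chaining fixpoint)."""
    known = set(solution)
    pending = list(rules)
    changed = True
    while changed:
        changed = False
        for r in pending:
            if r.product not in known and all(a in known for a in r.premises):
                known.add(r.product)
                changed = True
    return known


def cannot_be_excluded(b: str, solution: List[str], rules: List[Rule], target: str) -> bool:
    """Def. 3.7: target has no explanation once b is dropped from the initial solution."""
    return target not in producible([m for m in solution if m != b], rules)


S = ["A", "B"]
R = [Rule("r1", ("A", "B"), "D"), Rule("r2", ("D",), "C"),
     Rule("r3", ("B",), "E"), Rule("r4", ("E",), "C")]
print([b for b in S if cannot_be_excluded(b, S, R, "C")])  # ['B']

# With S2 and the rules r1, r2 only, every explanation of C goes through D,
# yet D can be excluded from S2 because r1 regenerates it from A and B.
S2 = ["A", "B", "D"]
R2 = R[:2]
print(cannot_be_excluded("D", S2, R2, "C"))  # False
```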



Metabolic network properties Informally, robustness can be defined as the capability of a whole
network to resist damage. In the biological literature there is no common agreement on what
"robustness" exactly means [25]. One of the most widely used definitions states: "robustness is a property that
allows a system to maintain its functions against internal and external perturbations." ( CD ). A similar
definition [44] is also widely used: "robustness, the ability to maintain performance in the face of
perturbations and uncertainty, is a long-recognized key property of living systems". Both definitions, however,
are not precisely stated and are therefore open to different interpretations. Moreover, the
notion of robustness is related to the maintenance of a "function" or of "performance". Both these concepts
subsume quantitative issues, and their exact meaning changes depending on the work considered. The
uncertainty in definitions makes it difficult both to evaluate robustness effectively and to compare different
networks with respect to this property. A more reliable way to assess this notion consists in considering the
qualitative features of the network in hand rather than its quantitative throughput HI EH. The notion of
network robustness can be linked to the overall error tolerance, seen as the capability of carrying
information in spite of local failures which, in turn, depend critically on the topology of network wiring [1].
In our framework this corresponds to evaluating the resistance to failures in terms of the maintenance of
the capability of producing a given metabolite. This paradigm shift allows us to propose the following
formal notions of robustness.

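Definition 3.8 (Strong Robustness) Given two m-networks $R_1$ and $R_2$, $R_2$ is strongly more robust than
$R_1$ in S for C, written $R_1 \ll_{S,C} R_2$, iff

$$\exists\, \mathcal{E}_{S,R_1}(C) \Rightarrow \exists\, \mathcal{E}_{S,R_2}(C) \wedge \mathcal{R}(\mathcal{E}_{S,R_1}(C)) = \mathcal{R}(\mathcal{E}_{S,R_2}(C)).$$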

This is quite a strong requirement, amounting to saying that all the rules used for producing C in $R_1$ are
present in $R_2$ and can be used as well. Of course, $R_2$ may also allow more explanations, using different
rules. This consideration leads us to formulate a weaker property, by requiring that $R_2$ is able to produce
the same metabolite, without constraints on the rules to be used.

Definition 3.9 (Weak Robustness) Given two m-networks $R_1$ and $R_2$, $R_2$ is weakly more robust than $R_1$
in S for C, written $R_1 \prec_{S,C} R_2$, iff

$$\exists\, \mathcal{E}_{S,R_1}(C) \Rightarrow \exists\, \mathcal{E}_{S,R_2}(C).$$
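Weak robustness reduces to an implication between two producibility checks. The Python sketch below compares a hypothetical network with knocked-out variants of itself; the forward-chaining `producible` helper and all names are assumptions for illustration, and strong robustness (which also compares rule sets of explanations) is not covered.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    name: str
    premises: Tuple[str, ...]
    product: str


def producible(solution: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Metabolites with an explanation w.r.t. S and R (forward-chaining fixpoint)."""
    known = set(solution)
    pending = list(rules)
    changed = True
    while changed:
        changed = False
        for r in pending:
            if r.product not in known and all(a in known for a in r.premises):
                known.add(r.product)
                changed = True
    return known


def weakly_more_robust(r_1: List[Rule], r_2: List[Rule],
                       solution: List[str], target: str) -> bool:
    """Def. 3.9: C producible with R1 implies C producible with R2."""
    return target not in producible(solution, r_1) or target in producible(solution, r_2)


S = ["A", "B"]
R = [Rule("r1", ("A", "B"), "D"), Rule("r2", ("D",), "C"),
     Rule("r3", ("B",), "E"), Rule("r4", ("E",), "C")]
without_r1 = [r for r in R if r.name != "r1"]
without_r1_r3 = [r for r in R if r.name not in ("r1", "r3")]
print(weakly_more_robust(R, without_r1, S, "C"))     # True: r3, r4 still yield C
print(weakly_more_robust(R, without_r1_r3, S, "C"))  # False: both routes are cut
```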
$$
\begin{array}{rcll}
R & ::= & 0 \mid A = M, R & \text{Reagents Environment (empty, or a reagent and Reagents Env)} \\
M & ::= & 0 \mid \pi^{\lambda}.S + M & \text{Molecule (empty, or an interaction and Molecule)} \\
S & ::= & 0 \mid A \parallel S & \text{Solution (empty, or a variable and Solution)} \\
\pi & ::= & a \mid \bar{a} \mid \tau & \text{Basic Action (input, output, delay)} \\
CGF & ::= & (R, S) & \text{Chemical Ground Form (reagent environment with initial Solution)}
\end{array}
$$

Table 1: Syntax of simplified CGF

4 Verification Methodology 

Our methodology is based on the construction of an abstract model of the biological system. The model is
obtained by a new abstract semantics of the system, interpreted as a concurrent network and expressed
using the Chemical Ground Form (CGF) [7] calculus. We exploit the notion of path for verifying our
properties. The CGF is a fragment of the stochastic π-calculus [35, 32]. Since we abstract away from
quantities, we resort to a simplified version of CGF, in which stochastic features, like action rates, are
discarded. In particular, we depart from the version of CGF presented in [10, 22], because we
represent solutions as sets of reagents rather than multisets. The syntax of CGF is defined in Tab. 1. We
consider a set of Names (ranged over by a, b, c, ...), a set of labels $\mathcal{L}$ (ranged over by λ, μ, ...), and a
set Mol (ranged over by A, B, ...) of variables (the reagents). A CGF specification is composed of a
(finite) list of reagent definitions $A_i = M_i$, where $A_i$ is a variable that stands for the name of a chemical
species and $M_i$ is a molecule that describes the interaction capabilities of the corresponding species. The
environment R defines the reagents of a solution S. A molecule M may do nothing, or may change after
a delay (e.g. because of a molecular decay), or may interact with other reagents. A standard notation is
adopted: τ represents a delay; $a$ and $\bar{a}$ model interaction over a shared channel a (the input and output,
respectively). Together with the reagent definitions, a CGF includes a solution S that represents the
initial conditions and is described by a parallel composition of variables, i.e. a finite list of reagents. This
maps onto the initial solution from Def. 2.2.



In order to distinguish the actions that participate in a move, we label them. In a CGF (R,S), R is
well-labeled if basic action labels are all distinct. We assume reagent environments to be well-defined and,
given R, to have a definition for each variable A in R or in S. Moreover, given a label $\lambda \in \mathcal{L}$, we
use the notation $R.\lambda$ to indicate the process $\pi^{\lambda}.S$, provided that $A = \ldots + \pi^{\lambda}.S + \ldots$ is the definition
of A occurring in R. Finally, given a CGF (R,S), we denote with T the set of all molecules occurring
either in S or in the rules of R. In the following, we will use Sol for the domain of sets of reagents. There
is one transition rule for delay actions and one for synchronizations. Transitions are of the form

$$S \xrightarrow{\vartheta,\ \hat{S},\ X} S' \quad \text{with } S, S', \hat{S} \in Sol,\ X \in Mol,\ \vartheta \in \Theta = \mathcal{L} \cup (\mathcal{L} \times \mathcal{L})$$

where

• $\vartheta$ reports the label(s) of the basic action(s) that participate in the move,

• $\hat{S} \subseteq S_0$ reports the subset of the reagents of the initial solution directly involved in the current move,

• X reports the unique reagent produced by the move (i.e. by the corresponding reaction).

• X reports the unique reagent produced by the move (i.e. by the corresponding reaction). 

The Rule (Delay) models the move of a process $\tau^{\lambda}.Q$ appearing in the definition of a reagent A. The
transition records the label λ, the singleton {A} if A belongs to the initial solution $S_0$, and the reagent
produced by the reaction. The Rule (Sync) models the synchronization between two processes $a^{\lambda}.Q$
and $\bar{a}^{\mu}.0$ occurring in the definitions of A and B, respectively. The transition records the label pair (λ, μ), together
with the part of $S_0$ used in the rule (i.e. $\{A,B\} \cap S_0$) and the possibly new reagent produced by the reaction.

$$\text{(Delay)}\quad S \xrightarrow{\lambda,\ \{A\} \cap S_0,\ Q} S \cup \{Q\} \qquad\qquad \text{(Sync)}\quad S \xrightarrow{(\lambda,\mu),\ \{A,B\} \cap S_0,\ Q} S \cup \{Q\}$$

Figure 1: LTS graph of Ex. 4.1, where $S_0 = \{A,B\}$, $S_1 = S_0 \cup \{D\}$, $S_2 = S_1 \cup \{C\}$, $S_3 = S_2 \cup \{E\}$, and $t_1 = ((\lambda,\mu), \{A,B\}, D)$.

We denote with $Tr((R,S_0)) = (Sol, \to, S_0, R)$ the labeled transition system (LTS) obtained starting from
the initial state $S_0 \in Sol$ w.r.t. the environment R, and with $Gr((R,S_0))$ the corresponding graph. Since
environments are well-labeled, different transitions leaving the same state carry distinct labels.

For simplicity, we identify each reaction by a label $\vartheta \in \mathcal{L} \cup (\mathcal{L} \times \mathcal{L})$. In our model, a reaction $r : A \circ B \to D$
can be identified by (λ, μ) and rendered by the following reagent definitions:

$$A = a^{\lambda}.D \qquad\qquad B = \bar{a}^{\mu}.0$$

Given an initial solution $S_0 = \{A,B\}$, the system may perform the transition $\{A,B\} \xrightarrow{(\lambda,\mu),\ \{A,B\},\ D} \{A,B,D\}$,
since $\{A,B\} \cap S_0 = \{A,B\}$ and $S' = S_0 \cup \{D\}$. If A and B are involved in other reactions, other
actions can be added to their specifications, as in the example below, where we illustrate our approach
on a toy reaction network.
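The (Delay) and (Sync) rules can be sketched as a function enumerating the moves available in a state. The Python code below is an illustration under stated assumptions, not the paper's implementation: reagent definitions are lists of hypothetical `Action` records, each continuation is a single reagent name (or `"0"`), the output side of a synchronization is assumed to continue as `0`, and the two synchronizing reagents are allowed to coincide.

```python
from typing import Dict, FrozenSet, Iterator, List, NamedTuple, Tuple


class Action(NamedTuple):
    kind: str     # "tau" (delay), "in" (a) or "out" (a-bar)
    channel: str  # interaction channel; unused for "tau"
    label: str    # distinct label such as "lambda" or "mu"
    cont: str     # continuation reagent; "0" for the inert molecule


Env = Dict[str, List[Action]]  # reagent name -> summands of its molecule
Move = Tuple[object, FrozenSet[str], str, FrozenSet[str]]


def moves(env: Env, state: FrozenSet[str], s0: FrozenSet[str]) -> Iterator[Move]:
    """Yield (labels, S_hat, produced reagent, target state) for each move."""
    for a in sorted(state):
        for act in env.get(a, []):
            if act.kind == "tau":  # (Delay): tau^lambda.Q in the definition of A
                yield act.label, frozenset({a}) & s0, act.cont, state | {act.cont}
            elif act.kind == "in":  # (Sync): a^lambda.Q meets a-bar^mu.0 in some B
                for b in sorted(state):
                    for co in env.get(b, []):
                        if co.kind == "out" and co.channel == act.channel:
                            yield ((act.label, co.label), frozenset({a, b}) & s0,
                                   act.cont, state | {act.cont})


env = {"A": [Action("in", "a", "lambda", "D")],
       "B": [Action("out", "a", "mu", "0")]}
s0 = frozenset({"A", "B"})
for labels, s_hat, produced, target in moves(env, s0, s0):
    print(labels, sorted(s_hat), produced, sorted(target))
# ('lambda', 'mu') ['A', 'B'] D ['A', 'B', 'D']
```

The single printed move reproduces the transition from $\{A,B\}$ to $\{A,B,D\}$ described above.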

Example 4.1 Consider the initial solution $S_0 = \{A,B\}$ and an m-network R consisting of the following
rules:

$(\lambda,\mu)$: $A \circ B \to D$
$(\delta,\eta)$: $A \circ C \to E$
$(\beta,\gamma)$: $D \circ B \to A$
$(\varepsilon)$: $D \to C$
$(\gamma,\nu)$: $D \circ C \to E$

The corresponding CGF specification extends the reagent definitions $A = a^{\lambda}.D$ and $B = \bar{a}^{\mu}.0$ with
further actions for the other reactions in which A and B take part, and includes definitions for C and D,
while the corresponding graph is in Fig. 1. For simplicity, in the presence of multiple self-loops, we
collapse the self-loop arcs into a single one.

• Starting from S_0 = {A,B}, the only possible transition (here called t_1) is the one that uses rule (λ,μ) and leads to the state S_1 containing D. After this,

• either we can fire transition t_3 (rule ε), which leads to S_2, which includes C;

• or we can fire transition t_2 (rule (β,γ)), which leads back to S_1, where A is already present.

• From S_2, both transitions t_4 (rule (δ,η)) and t_5 (rule (ψ,ν)) are possible; both lead to S_3 and produce E.

• Intuitively, we can observe that some transitions cause others: t_1 causes t_2, t_3, t_4 and t_5, and t_3 causes both t_4 and t_5, while t_4 and t_5 are independent of each other.
• We have two paths reaching a state that includes E: $S_0 \xrightarrow{t_1} S_1 \xrightarrow{t_3} S_2 \xrightarrow{t_4} S_3$, which corresponds to the explanation E_(δ,η)[A[], C_ε[D_(λ,μ)[A[],B[]]]], and $S_0 \xrightarrow{t_1} S_1 \xrightarrow{t_3} S_2 \xrightarrow{t_5} S_3$, which corresponds to the explanation E_(ψ,ν)[D_(λ,μ)[A[],B[]], C_ε[D_(λ,μ)[A[],B[]]]].

• Establishing which metabolites in S_0 are used in each transition can be useful to investigate their impact on the production of the other metabolites. For instance, A is necessary for the production of C and E, because both the states including C and E are reached using t_1 (rule (λ,μ)), which requires A.

In the following, we make precise the notions informally introduced in Ex. 4.1. Note that self-loops can correspond either to the application of a rule already applied or to the application of a rule that has not been applied yet but produces a metabolite already present. Self-loops of the first kind do not add any useful information from a causality point of view, in terms of the properties introduced in the previous sections. Self-loops of the second kind can instead be useful, especially if we are interested in checking whether the system can produce a certain metabolite even if it is already present in the initial solution. We first focus on the computation paths that include no self-loops at all, which we call causally relevant paths. This notion is used to verify many properties of the previous section.

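Definition 4.2 (𝒞-path) A path p in Gr((R,S_0)) is a causally relevant path (𝒞-path) if

$$p = S_0 \xrightarrow{\theta_0,\, S'_0,\, X_0} S_1 \xrightarrow{\theta_1,\, S'_1,\, X_1} S_2 \cdots S_{m-1} \xrightarrow{\theta_{m-1},\, S'_{m-1},\, X_{m-1}} S_m \quad \text{and} \quad S_i \neq S_{i-1} \text{ for all } i \in [1,m],$$

where S'_i ⊆ S_0 denotes the part of the initial solution used by the transition labeled θ_i.

We say that p leads to C if C = X_{m-1} (i.e. if S_m is the first state including C).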
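Under the same simplifying assumptions (states are sets of metabolites, rules are hypothetical `(label, reactants, product)` triples), the 𝒞-paths leading to a metabolite can be enumerated by a depth-first search that never takes a self-loop and stops as soon as the target is produced. The sketch encodes Ex. 4.1, reading its rules as (λ,μ): A ⊕ B → D, (δ,η): A ⊕ C → E, (β,γ): D ⊕ B → A, ε: D → C and (ψ,ν): D ⊕ C → E. The search is exhaustive, hence exponential in the worst case, which is acceptable only for toy networks.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)

# Rules of Ex. 4.1; ASCII labels stand in for the Greek ones.
EX41: List[Rule] = [
    ("lam_mu", ("A", "B"), "D"),
    ("delta_eta", ("A", "C"), "E"),
    ("beta_gamma", ("D", "B"), "A"),
    ("eps", ("D",), "C"),
    ("psi_nu", ("D", "C"), "E"),
]


def c_paths(rules: List[Rule], s0: Set[str], target: str) -> List[List[Rule]]:
    """Causally relevant paths from s0 whose last transition first produces target."""
    found: List[List[Rule]] = []

    def explore(state: FrozenSet[str], path: List[Rule]) -> None:
        for rule in rules:
            _, reactants, product = rule
            # S_i != S_{i-1}: skip disabled rules and self-loops.
            if set(reactants) <= state and product not in state:
                if product == target:
                    found.append(path + [rule])  # S_m is the first state with target
                else:
                    explore(state | {product}, path + [rule])

    explore(frozenset(s0), [])
    return found


paths = c_paths(EX41, {"A", "B"}, "E")
print([[label for label, _, _ in p] for p in paths])
# [['lam_mu', 'eps', 'delta_eta'], ['lam_mu', 'eps', 'psi_nu']]
```

The two label sequences are the two ways of reaching E discussed in Ex. 4.1; (β,γ) never appears because, from S_0 = {A,B}, it only yields a self-loop.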
Theorem 4.3 (Correspondence) Given a 𝒞-path p in Gr((R,S_0)) that leads to C,

$$p = S_0 \xrightarrow{\theta_0,\, S'_0,\, X_0} S_1 \xrightarrow{\theta_1,\, S'_1,\, X_1} S_2 \cdots S_{m-1} \xrightarrow{\theta_{m-1},\, S'_{m-1},\, X_{m-1}} S_m,$$

let tr_p(·) be the function which, for the given path p and a reagent B, is defined as follows:

$$tr_p(B) = \begin{cases} B_{\theta}[\,tr_p(A_1), tr_p(A_2)\,] & \text{if } \exists i \in [1,m].\ X_{i-1} = B \text{ and } \theta : A_1 \oplus A_2 \to B \text{ is the rule applied in the transition } t_i,\\ B[\,] & \text{if } B \in S_0. \end{cases}$$

We obtain an explanation ℰ_{S_0,R}(C) for C as tr_p(C), which uses the same rules as p.
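The function tr_p can be sketched in the same setting. On a 𝒞-path each metabolite outside S_0 is produced exactly once, so its producing rule is unique and the recursion below is well defined. The textual syntax `M_label[...]` and the helper names are illustrative assumptions, and Ex. 4.1 is encoded with hypothetical ASCII labels as in the code.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)

# Rules of Ex. 4.1 with ASCII stand-ins for the Greek labels.
EX41: List[Rule] = [
    ("lam_mu", ("A", "B"), "D"),
    ("delta_eta", ("A", "C"), "E"),
    ("beta_gamma", ("D", "B"), "A"),
    ("eps", ("D",), "C"),
    ("psi_nu", ("D", "C"), "E"),
]


def c_paths(rules: List[Rule], s0: Set[str], target: str) -> List[List[Rule]]:
    """C-paths from s0 whose last transition first produces target."""
    found: List[List[Rule]] = []

    def explore(state: FrozenSet[str], path: List[Rule]) -> None:
        for rule in rules:
            _, reactants, product = rule
            if set(reactants) <= state and product not in state:
                if product == target:
                    found.append(path + [rule])
                else:
                    explore(state | {product}, path + [rule])

    explore(frozenset(s0), [])
    return found


def tr(path: List[Rule], s0: Set[str], metabolite: str) -> str:
    """tr_p: M[] for initial metabolites, M_label[tr_p(A1),...] for produced ones."""
    if metabolite in s0:
        return f"{metabolite}[]"
    # On a C-path each product is created exactly once, so its producer is unique.
    producer = {product: (label, reactants) for label, reactants, product in path}
    label, reactants = producer[metabolite]
    inner = ",".join(tr(path, s0, r) for r in reactants)
    return f"{metabolite}_{label}[{inner}]"


S0 = {"A", "B"}
for p in c_paths(EX41, S0, "E"):
    print(tr(p, S0, "E"))
# E_delta_eta[A[],C_eps[D_lam_mu[A[],B[]]]]
# E_psi_nu[D_lam_mu[A[],B[]],C_eps[D_lam_mu[A[],B[]]]]
```

Up to notation, the two strings are the explanations given for E in Ex. 4.1.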

Moreover, given an explanation ℰ_{S_0,R}(P), we can construct a set of corresponding paths, starting from the subset of the initial solution used in the explanation, i.e. all the metabolites A such that A[] occurs in ℰ_{S_0,R}(P). We then explore the explanation structure from the innermost to the outermost level, firing the transitions corresponding to the rules used in the explanation. Note that serializing the possible parallelism of the explanation can give rise to a set of paths rather than to a unique path. For instance, if we start from the explanation C_θ1[A_θ2[B[],D[]], F_θ3[E[],G[]]], corresponding to the application of rules θ1 = A ⊕ F → C, θ2 = B ⊕ D → A and θ3 = E ⊕ G → F, then we have two corresponding paths, which differ in the order in which the transitions occur. Note that the paths obtained from ℰ_{S_0,R}(P) are 𝒞-paths.
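The serialization step can be sketched as computing all orders in which the rules of an explanation tree can fire, with every sub-explanation completed before the rule that consumes its result. We assume a hypothetical tree encoding (a leaf `(M,)` for M[] and a node `(M, label, children)` for M_label[...]) and, as an extra assumption, that a rule occurring in two branches (as D_(λ,μ) does in the second explanation of Ex. 4.1) is fired only once, at its first occurrence, since a second firing would be a self-loop.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical encoding of explanations: a leaf M[] is the 1-tuple (M,),
# an inner node M_label[c1, ..., cn] is (M, label, [c1, ..., cn]).


def interleave(xs: Sequence[str], ys: Sequence[str]) -> List[List[str]]:
    """All merges of xs and ys that keep the relative order of each sequence."""
    if not xs:
        return [list(ys)]
    if not ys:
        return [list(xs)]
    return ([[xs[0]] + rest for rest in interleave(xs[1:], ys)]
            + [[ys[0]] + rest for rest in interleave(xs, ys[1:])])


def serialize(expl) -> List[Tuple[str, ...]]:
    """Rule firing orders compatible with expl: sub-explanations fire first."""
    if len(expl) == 1:                 # leaf: metabolite taken from S_0
        return [()]
    _, label, children = expl
    orders: List[List[str]] = [[]]
    for child in children:             # independent branches may interleave
        orders = [merged for order in orders
                  for sub in serialize(child)
                  for merged in interleave(order, sub)]
    result = set()
    for order in orders:
        seen: Set[str] = set()
        firing: List[str] = []
        for rule in order:             # assumption: a shared rule fires once
            if rule not in seen:
                seen.add(rule)
                firing.append(rule)
        result.add(tuple(firing) + (label,))
    return sorted(result)


def initial_subset(expl) -> Set[str]:
    """Metabolites A such that A[] occurs in the explanation."""
    if len(expl) == 1:
        return {expl[0]}
    return set().union(*(initial_subset(child) for child in expl[2]))


# C_th1[A_th2[B[],D[]], F_th3[E[],G[]]] from the text.
expl = ("C", "th1", [("A", "th2", [("B",), ("D",)]),
                     ("F", "th3", [("E",), ("G",)])])
print(serialize(expl))  # [('th2', 'th3', 'th1'), ('th3', 'th2', 'th1')]
```

As in the text, the example yields two paths, starting from {B, D, E, G}, that differ only in the order of θ2 and θ3.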
Definition 4.4 (p-path) A path p in Gr((R,S_0)) is a relevant path (p-path) if

$$p = S_0 \xrightarrow{\theta_0,\, S'_0,\, X_0} S_1 \xrightarrow{\theta_1,\, S'_1,\, X_1} S_2 \cdots S_{m-1} \xrightarrow{\theta_{m-1},\, S'_{m-1},\, X_{m-1}} S_m$$

and, for all j ∈ [1,m], S_j = S_{j-1} implies

(i) X_{j-1} ∈ S_0 (the produced metabolite was already in S_0),

(ii) $\{X_{j-1}\} \cap \bigcup_{0 \le i < j-1} \{X_i\} = \emptyset$ (the produced metabolite was never produced before),

(iii) $\{X_{j-1}\} \cap \bigcup_{0 \le i < j-1} S'_i = \emptyset$ (the produced metabolite was never required before).

We say that p leads to C if C = X_{m-1}.
Intuitively, conditions (i)-(iii) will help us determine which metabolites could harmlessly be excluded from the initial solution, by identifying the metabolites that the system itself is able to produce before they are required. Note that p-paths, as well as 𝒞-paths, are always finite: by definition, a self-loop transition can be included in a p-path at most once. In particular, each 𝒞-path is also a p-path.
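Definition 4.4 can be checked on a given sequence of rule applications. The sketch tracks the metabolites produced so far and the part of S_0 required so far, and rejects any self-loop that violates (i), (ii) or (iii). It encodes Ex. 4.8 as we read its rules (listed in the code with hypothetical ASCII labels); `p1` is the p-path with a self-loop producing O that is discussed in Ex. 4.13.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)

# Ex. 4.8 as we read it; ASCII labels stand in for the Greek ones.
EX48: List[Rule] = [
    ("lam_mu", ("A", "B"), "C"), ("delta_eta", ("C", "F"), "P"),
    ("beta_gamma", ("D", "A"), "F"), ("chi_vartheta", ("B", "D"), "H"),
    ("psi_nu", ("D", "H"), "E"), ("sigma_rho", ("L", "O"), "H"),
    ("phi_pi", ("E", "H"), "L"), ("theta_iota", ("C", "D"), "O"),
    ("alpha_zeta", ("P", "O"), "E"),
]
S0 = {"A", "B", "D", "O"}
RULE: Dict[str, Rule] = {rule[0]: rule for rule in EX48}


def is_p_path(fired: List[Rule], s0: Set[str]) -> bool:
    """Check Definition 4.4 on a sequence of rule applications from s0."""
    state = set(s0)
    produced: Set[str] = set()   # X_0, ..., X_{j-2}
    required: Set[str] = set()   # S'_0 ∪ ... ∪ S'_{j-2}
    for _, reactants, product in fired:
        if not set(reactants) <= state:
            return False         # rule not enabled: not a path of the LTS
        if product in state:     # self-loop: S_j = S_{j-1}
            if product not in s0 or product in produced or product in required:
                return False     # violates (i), (ii) or (iii)
        produced.add(product)
        required |= set(reactants) & s0
        state.add(product)
    return True


p1 = [RULE[name] for name in ("lam_mu", "theta_iota", "beta_gamma", "delta_eta", "alpha_zeta")]
late = [RULE[name] for name in ("lam_mu", "beta_gamma", "delta_eta", "alpha_zeta", "theta_iota")]
print(is_p_path(p1, S0))    # True: O is re-produced before anything requires it
print(is_p_path(late, S0))  # False: alpha_zeta has already required O
```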

We are now ready to characterize all the properties introduced in §3 in terms of 𝒞- and p-paths. A rule θ is essential for the metabolite C if every 𝒞-path leading to C includes θ, while the rules θ' and θ'' are mutually essential for C if every 𝒞-path leading to C includes at least one of the two rules.

Theorem 4.5 (Essentiality and Mutual Essentiality)

• A rule θ is essential in S_0 for a reagent C ∉ S_0 iff for every 𝒞-path $S_0 \xrightarrow{\theta_0,\, S'_0,\, X_0} S_1 \xrightarrow{\theta_1,\, S'_1,\, X_1} S_2 \cdots S_{m-1} \xrightarrow{\theta_{m-1},\, S'_{m-1},\, X_{m-1}} S_m$ in Gr((R,S_0)) leading to C, there exists at least one i ∈ [0, m-1] such that θ_i = θ.

• Two rules θ' and θ'' are mutually essential in S_0 for a reagent C ∉ S_0 iff

  - neither θ' nor θ'' is essential in S_0 for C, and

  - for every 𝒞-path $S_0 \xrightarrow{\theta_0,\, S'_0,\, X_0} S_1 \xrightarrow{\theta_1,\, S'_1,\, X_1} S_2 \cdots S_{m-1} \xrightarrow{\theta_{m-1},\, S'_{m-1},\, X_{m-1}} S_m$ in Gr((R,S_0)) leading to C, there exists at least one i ∈ [0, m-1] such that θ_i = θ' or θ_i = θ''.
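Theorem 4.5 suggests a direct, if exhaustive, check: collect the rule labels of every 𝒞-path leading to C and test membership. The sketch below does this for Ex. 4.1, with the same hypothetical encoding as before; note that if no 𝒞-path leads to C at all, every rule is vacuously essential.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)

EX41: List[Rule] = [
    ("lam_mu", ("A", "B"), "D"), ("delta_eta", ("A", "C"), "E"),
    ("beta_gamma", ("D", "B"), "A"), ("eps", ("D",), "C"),
    ("psi_nu", ("D", "C"), "E"),
]
S0 = {"A", "B"}


def c_path_labels(rules: List[Rule], s0: Set[str], target: str) -> List[Set[str]]:
    """Rule labels used by each C-path from s0 that leads to target."""
    found: List[Set[str]] = []

    def explore(state: FrozenSet[str], used: List[str]) -> None:
        for label, reactants, product in rules:
            if set(reactants) <= state and product not in state:  # no self-loops
                if product == target:
                    found.append(set(used + [label]))
                else:
                    explore(state | {product}, used + [label])

    explore(frozenset(s0), [])
    return found


def essential(rules: List[Rule], s0: Set[str], target: str, rule: str) -> bool:
    """Every C-path leading to target applies `rule`."""
    return all(rule in used for used in c_path_labels(rules, s0, target))


def mutually_essential(rules: List[Rule], s0: Set[str], target: str, r1: str, r2: str) -> bool:
    """Neither rule is essential, yet every C-path to target applies one of them."""
    if essential(rules, s0, target, r1) or essential(rules, s0, target, r2):
        return False
    return all(r1 in used or r2 in used for used in c_path_labels(rules, s0, target))


print(essential(EX41, S0, "E", "lam_mu"))                        # True
print(essential(EX41, S0, "E", "delta_eta"))                     # False
print(mutually_essential(EX41, S0, "E", "delta_eta", "psi_nu"))  # True
```

In this encoding (λ,μ) and ε are essential for E, while (δ,η) and (ψ,ν) are each dispensable but mutually essential, since every 𝒞-path to E ends with one of them.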

In this context, two 𝒞-paths leading to C represent vicarious explanations if they resort to different sets of rules.

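Theorem 4.6 (Vicariate) Given two 𝒞-paths p_1 and p_2 in Gr((R,S_0)), leading to C,

$$p_1 = S_0 \xrightarrow{\theta^1_0,\, {S'}^1_0,\, X^1_0} S^1_1 \cdots S^1_{h-1} \xrightarrow{\theta^1_{h-1},\, {S'}^1_{h-1},\, X^1_{h-1}} S^1_h$$

$$p_2 = S_0 \xrightarrow{\theta^2_0,\, {S'}^2_0,\, X^2_0} S^2_1 \cdots S^2_{k-1} \xrightarrow{\theta^2_{k-1},\, {S'}^2_{k-1},\, X^2_{k-1}} S^2_k$$

p_1 is vicarious of p_2 iff either h ≠ k or there exists at least one j such that $\theta^1_j \notin \bigcup_{0 \le l < k} \{\theta^2_l\}$.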
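Because a rule never fires twice on a 𝒞-path (its product is already present after the first firing), the condition of Theorem 4.6 amounts to comparing the rule sets of the two paths. A small sketch on Ex. 4.1, again with hypothetical ASCII labels:

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)

EX41: List[Rule] = [
    ("lam_mu", ("A", "B"), "D"), ("delta_eta", ("A", "C"), "E"),
    ("beta_gamma", ("D", "B"), "A"), ("eps", ("D",), "C"),
    ("psi_nu", ("D", "C"), "E"),
]
S0 = {"A", "B"}


def c_path_labels(rules: List[Rule], s0: Set[str], target: str) -> List[List[str]]:
    """Ordered rule labels of each C-path from s0 leading to target."""
    found: List[List[str]] = []

    def explore(state: FrozenSet[str], used: List[str]) -> None:
        for label, reactants, product in rules:
            if set(reactants) <= state and product not in state:
                if product == target:
                    found.append(used + [label])
                else:
                    explore(state | {product}, used + [label])

    explore(frozenset(s0), [])
    return found


def vicarious(p1: List[str], p2: List[str]) -> bool:
    """Lengths differ, or p1 applies a rule that p2 never applies."""
    rules_p2 = set(p2)
    return len(p1) != len(p2) or any(rule not in rules_p2 for rule in p1)


p, q = c_path_labels(EX41, S0, "E")
print(vicarious(p, q), vicarious(p, p))  # True False
```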

To prove checkpoint properties, we exploit the information recorded in the subsets S'_i in order to check whether a certain metabolite B is necessary for the production of a reagent C.

Theorem 4.7 (Checkpoint) Given R and an initial solution S_0, reagent B is necessary for the production of C if, for every 𝒞-path $S_0 \xrightarrow{\theta_0,\, S'_0,\, X_0} S_1 \xrightarrow{\theta_1,\, S'_1,\, X_1} S_2 \cdots S_{m-1} \xrightarrow{\theta_{m-1},\, S'_{m-1},\, X_{m-1}} S_m$ in Gr((R,S_0)) leading to C,

(i) B ∈ S_{m-1} and (ii) if B ∈ S_0, then B ∈ (S'_0 ∪ ... ∪ S'_{m-1}).

Conditions (i) and (ii) amount to saying that there is a rule θ_i that has B in its premises.
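The checkpoint condition only needs, for each 𝒞-path, the state S_{m-1} reached before the last transition and the union of the used subsets S'_i. The sketch below encodes Ex. 4.8 as we read its rules (hypothetical ASCII labels in the code) and checks two statements made about that network: that H is necessary for L, and that A is not necessary for E. If no 𝒞-path leads to the target, the check is vacuously true.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)

# Ex. 4.8 as we read it; ASCII labels stand in for the Greek ones.
EX48: List[Rule] = [
    ("lam_mu", ("A", "B"), "C"), ("delta_eta", ("C", "F"), "P"),
    ("beta_gamma", ("D", "A"), "F"), ("chi_vartheta", ("B", "D"), "H"),
    ("psi_nu", ("D", "H"), "E"), ("sigma_rho", ("L", "O"), "H"),
    ("phi_pi", ("E", "H"), "L"), ("theta_iota", ("C", "D"), "O"),
    ("alpha_zeta", ("P", "O"), "E"),
]
S0 = {"A", "B", "D", "O"}


def c_paths(rules: List[Rule], s0: Set[str], target: str) -> List[List[Rule]]:
    """C-paths from s0 whose last transition first produces target."""
    found: List[List[Rule]] = []

    def explore(state: FrozenSet[str], path: List[Rule]) -> None:
        for rule in rules:
            _, reactants, product = rule
            if set(reactants) <= state and product not in state:
                if product == target:
                    found.append(path + [rule])
                else:
                    explore(state | {product}, path + [rule])

    explore(frozenset(s0), [])
    return found


def necessary(rules: List[Rule], s0: Set[str], b: str, target: str) -> bool:
    """Conditions (i) and (ii) of Theorem 4.7 on every C-path leading to target."""
    for path in c_paths(rules, s0, target):
        before_last = set(s0) | {product for _, _, product in path[:-1]}  # S_{m-1}
        used = set().union(*(set(reactants) & s0 for _, reactants, _ in path))
        if b not in before_last:       # (i) fails
            return False
        if b in s0 and b not in used:  # (ii) fails
            return False
    return True


print(necessary(EX48, S0, "H", "L"))  # True
print(necessary(EX48, S0, "A", "E"))  # False
```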

Example 4.8 Consider the initial solution S_0 = {A,B,D,O} and an m-network R, consisting of the rules reported below on the left, while the corresponding CGF specification is on the right.

(λ,μ)  A ⊕ B → C        A = a^λ.C + c̄^γ.0
(δ,η)  C ⊕ F → P        B = ā^μ.0 + d^χ.H
(β,γ)  D ⊕ A → F        C = b^δ.P + g^θ.O
(χ,ϑ)  B ⊕ D → H        D = c^β.F + d̄^ϑ.0 + e^ψ.E + ḡ^ι.0
(ψ,ν)  D ⊕ H → E        E = h^φ.L
(σ,ρ)  L ⊕ O → H        F = b̄^η.0
(φ,π)  E ⊕ H → L        H = ē^ν.0 + h̄^π.0
(θ,ι)  C ⊕ D → O        L = f^σ.H
(α,ζ)  P ⊕ O → E        O = f̄^ρ.0 + l̄^ζ.0
                        P = l^α.E
Figure 2: LTS Graph of Ex. 4.8



Figure 2 depicts the LTS semantics, where S_0 = {A,B,D,O}, S_20 = T, and

S_1 = S_0 ∪ {C}, S_2 = S_0 ∪ {F}, S_3 = S_0 ∪ {H}, S_4 = S_1 ∪ {F}, S_5 = S_3 ∪ {C}, S_6 = S_3 ∪ {E},
S_7 = S_3 ∪ {F}, S_8 = S_4 ∪ {P}, S_9 = S_4 ∪ {H}, S_10 = S_5 ∪ {E}, S_11 = S_6 ∪ {F}, S_12 = S_6 ∪ {L},
S_13 = S_8 ∪ {E}, S_14 = S_8 ∪ {H}, S_15 = S_9 ∪ {E}, S_16 = S_10 ∪ {L}, S_17 = S_11 ∪ {L}, S_18 = S_13 ∪ {H},
S_19 = S_16 ∪ {F}.

We can observe the following properties.

• The production of H is necessary for that of L: indeed, H ∉ S_0 and all the states containing L come after states that include H.

• Rule (χ,ϑ) is essential for the production of H. Actually, rule (σ,ρ) is also able to produce H, but it requires the presence of L, which in turn requires H to be produced, as discussed before.

• Neither rule (ψ,ν) nor rule (α,ζ) is essential for the production of E, because there exists a 𝒞-path p (p') leading to E which does not use rule (ψ,ν) ((α,ζ), resp.): p = S_0 → S_2 → S_4 → S_8 → S_13 and p' = S_0 → S_3 → S_7. Nevertheless, rules (ψ,ν) and (α,ζ) are mutually essential. As expected, p and p' therefore represent two alternative ways of producing E. The corresponding explanations E_(α,ζ)[P_(δ,η)[C_(λ,μ)[A[],B[]], F_(β,γ)[D[],A[]]], O_(θ,ι)[C_(λ,μ)[A[],B[]], D[]]] (for p) and E_(ψ,ν)[D[], H_(χ,ϑ)[B[],D[]]] (for p') represent two vicarious explanations for E.

• Note that if we exclude A from the initial solution, we cannot produce F, because rule (β,γ) could not be applied, but we still have a way to produce E.

We now characterize the causality and robustness properties in our computational framework. We say that a rule causes another rule whenever the second one is always preceded by the first one.

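Definition 4.9 (Causality) Let θ, θ' be two rules used in Gr((R,S_0)). The rule θ causes θ' (θ ⊏ θ') in Gr((R,S_0)) iff for every 𝒞-path in Gr((R,S_0))

$$p = S_0 \xrightarrow{\theta_0,\, S'_0,\, X_0} S_1 \xrightarrow{\theta_1,\, S'_1,\, X_1} S_2 \cdots S_{m-1} \xrightarrow{\theta_{m-1},\, S'_{m-1},\, X_{m-1}} S_m,$$

θ' = θ_j implies that θ_i = θ for some i < j.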
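Definition 4.9 quantifies over all 𝒞-paths of the graph. Every 𝒞-path is a prefix of a maximal one (a path that cannot be extended without a self-loop), and the condition only looks at what precedes θ', so it is enough to check the maximal 𝒞-paths. A sketch on Ex. 4.1 with hypothetical ASCII labels; if θ' never fires, θ ⊏ θ' holds vacuously.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)

EX41: List[Rule] = [
    ("lam_mu", ("A", "B"), "D"), ("delta_eta", ("A", "C"), "E"),
    ("beta_gamma", ("D", "B"), "A"), ("eps", ("D",), "C"),
    ("psi_nu", ("D", "C"), "E"),
]
S0 = {"A", "B"}


def maximal_c_paths(rules: List[Rule], s0: Set[str]) -> List[List[str]]:
    """Label sequences of the C-paths from s0 with no non-self-loop extension."""
    out: List[List[str]] = []

    def explore(state: FrozenSet[str], path: List[str]) -> None:
        moves = [(label, product) for label, reactants, product in rules
                 if set(reactants) <= state and product not in state]
        if not moves:
            out.append(path)
            return
        for label, product in moves:
            explore(state | {product}, path + [label])

    explore(frozenset(s0), [])
    return out


def causes(rules: List[Rule], s0: Set[str], first: str, second: str) -> bool:
    """Whenever `second` fires on a C-path, `first` has fired before it."""
    for path in maximal_c_paths(rules, s0):
        # Rules never repeat on a C-path, so index() finds the only occurrence.
        if second in path and first not in path[:path.index(second)]:
            return False
    return True


print(causes(EX41, S0, "lam_mu", "eps"))        # True  (t_1 causes t_3)
print(causes(EX41, S0, "eps", "psi_nu"))        # True  (t_3 causes t_5)
print(causes(EX41, S0, "delta_eta", "psi_nu"))  # False (t_4 and t_5 are independent)
```

The results match the informal discussion of t_1, t_3, t_4 and t_5 in Ex. 4.1.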






A Taxonomy of Causality-Based Biological Properties 



Robustness concerns the capacity of a network to produce a certain metabolite.

Theorem 4.10 (Strong and Weak Robustness) Given two environments R_1 and R_2,

• R_1 ⊑_{S_0} R_2 for C iff for every 𝒞-path p ∈ Gr((R_1,S_0)) leading to C, p ∈ Gr((R_2,S_0)) and p leads to C.

• R_1 ≲_{S_0} R_2 for C iff for every 𝒞-path p ∈ Gr((R_1,S_0)) leading to C, there exists a 𝒞-path p' ∈ Gr((R_2,S_0)) leading to C.
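Both robustness notions can be checked by comparing the 𝒞-path sets of the two environments. The sketch assumes that a path of R_1 belongs to Gr((R_2,S_0)) exactly when R_2 contains the same rule triples, which holds here because states depend only on S_0 and on the rules fired. It compares Ex. 4.1 with hypothetical variants obtained by deleting one rule.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)
Path = Tuple[Rule, ...]

EX41: List[Rule] = [
    ("lam_mu", ("A", "B"), "D"), ("delta_eta", ("A", "C"), "E"),
    ("beta_gamma", ("D", "B"), "A"), ("eps", ("D",), "C"),
    ("psi_nu", ("D", "C"), "E"),
]
S0 = {"A", "B"}


def c_paths(rules: List[Rule], s0: Set[str], target: str) -> List[Path]:
    """C-paths from s0 leading to target, as hashable tuples of rules."""
    found: List[Path] = []

    def explore(state: FrozenSet[str], path: Path) -> None:
        for rule in rules:
            _, reactants, product = rule
            if set(reactants) <= state and product not in state:
                if product == target:
                    found.append(path + (rule,))
                else:
                    explore(state | {product}, path + (rule,))

    explore(frozenset(s0), ())
    return found


def strongly_robust(r1: List[Rule], r2: List[Rule], s0: Set[str], target: str) -> bool:
    """Every C-path of R1 leading to target is also a C-path of R2."""
    in_r2 = set(c_paths(r2, s0, target))
    return all(p in in_r2 for p in c_paths(r1, s0, target))


def weakly_robust(r1: List[Rule], r2: List[Rule], s0: Set[str], target: str) -> bool:
    """If some C-path of R1 leads to target, some C-path of R2 does too."""
    return not c_paths(r1, s0, target) or bool(c_paths(r2, s0, target))


NO_PSI = [r for r in EX41 if r[0] != "psi_nu"]  # hypothetical variant
NO_EPS = [r for r in EX41 if r[0] != "eps"]     # hypothetical variant
print(strongly_robust(NO_PSI, EX41, S0, "E"))  # True
print(strongly_robust(EX41, NO_PSI, S0, "E"))  # False
print(weakly_robust(EX41, NO_PSI, S0, "E"))    # True
print(weakly_robust(EX41, NO_EPS, S0, "E"))    # False
```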

Finally, we characterize the redundancy and exclusion properties. Both concern the role of initial metabolites and the ability of the network to produce metabolites when the initial set is modified. To this aim, we resort to p-paths and to the following notion which, given a p-path p, computes the subset ℳ(p) of the initial solution strictly required to perform the transitions of the given path. This subset is obtained by collecting all the subsets S'_i (i.e. the subsets of the initial solution S_0 used by the transitions in p) and by removing the metabolites (in S_0) that the system itself is able to produce along the path. Recall that, since p is a p-path, we are guaranteed that the transitions producing these metabolites always come before the transitions that use them.

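Definition 4.11 Given a p-path in Gr((R,S_0)), $p = S_0 \xrightarrow{\theta_0,\, S'_0,\, X_0} S_1 \xrightarrow{\theta_1,\, S'_1,\, X_1} S_2 \cdots S_{m-1} \xrightarrow{\theta_{m-1},\, S'_{m-1},\, X_{m-1}} S_m$,

$$\mathcal{M}(p) = \Big(\bigcup_{0 \le i < m} S'_i\Big) \setminus \Big(\bigcup_{0 \le i < m} \{X_i\}\Big)$$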
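Definition 4.11 transcribes directly into code. The sketch computes ℳ(p) for the two paths of Ex. 4.8 that are revisited in Ex. 4.13: p1, which re-produces O through a self-loop, and p' = S_0 → S_3 → E. Rules are encoded as we read Ex. 4.8, with hypothetical ASCII labels.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)

# Ex. 4.8 as we read it; ASCII labels stand in for the Greek ones.
EX48: List[Rule] = [
    ("lam_mu", ("A", "B"), "C"), ("delta_eta", ("C", "F"), "P"),
    ("beta_gamma", ("D", "A"), "F"), ("chi_vartheta", ("B", "D"), "H"),
    ("psi_nu", ("D", "H"), "E"), ("sigma_rho", ("L", "O"), "H"),
    ("phi_pi", ("E", "H"), "L"), ("theta_iota", ("C", "D"), "O"),
    ("alpha_zeta", ("P", "O"), "E"),
]
S0 = {"A", "B", "D", "O"}
RULE: Dict[str, Rule] = {rule[0]: rule for rule in EX48}


def required_initial(path: List[Rule], s0: Set[str]) -> Set[str]:
    """M(p): metabolites of S_0 used along p, minus those p itself produces."""
    used: Set[str] = set()
    produced: Set[str] = set()
    for _, reactants, product in path:
        used |= set(reactants) & s0  # S'_i
        produced.add(product)        # X_i
    return used - produced


p1 = [RULE[name] for name in ("lam_mu", "theta_iota", "beta_gamma", "delta_eta", "alpha_zeta")]
p_prime = [RULE[name] for name in ("chi_vartheta", "psi_nu")]
print(sorted(required_initial(p1, S0)))       # ['A', 'B', 'D']
print(sorted(required_initial(p_prime, S0)))  # ['B', 'D']
```

In both cases O is absent from ℳ(p), which is the observation Ex. 4.13 relies on.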

An initial solution is redundant for the production of a metabolite C whenever there exists at least one component that is not required from the very beginning in all the paths that lead to C. Moreover, to produce a metabolite C, we can exclude a metabolite B from the initial solution S_0 if B is not required from the very beginning in at least one path that leads to C.

Theorem 4.12 (Redundancy and Exclusion) Given an environment R, an initial solution S_0, and a metabolite C ∉ S_0, let 𝒫_C = {p | p is a p-path in Gr((R,S_0)) that leads to C}. Then

• S_0 is redundant for C iff $\bigcup_{p \in \mathcal{P}_C} \mathcal{M}(p) \subset S_0$;

• a metabolite B can be excluded for the production of C iff there exists a p-path p in 𝒫_C such that B ∉ ℳ(p).
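The exclusion test of Theorem 4.12 needs the p-paths leading to C. The enumeration below extends the 𝒞-path search by also admitting the self-loops allowed by Definition 4.4; it is exhaustive and meant for toy networks only. Only the exclusion test is sketched, and Ex. 4.8 is encoded as we read it, with hypothetical ASCII labels.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # hypothetical (label, reactants, product)

# Ex. 4.8 as we read it; ASCII labels stand in for the Greek ones.
EX48: List[Rule] = [
    ("lam_mu", ("A", "B"), "C"), ("delta_eta", ("C", "F"), "P"),
    ("beta_gamma", ("D", "A"), "F"), ("chi_vartheta", ("B", "D"), "H"),
    ("psi_nu", ("D", "H"), "E"), ("sigma_rho", ("L", "O"), "H"),
    ("phi_pi", ("E", "H"), "L"), ("theta_iota", ("C", "D"), "O"),
    ("alpha_zeta", ("P", "O"), "E"),
]
S0 = {"A", "B", "D", "O"}


def p_paths(rules: List[Rule], s0: Set[str], target: str) -> List[List[Rule]]:
    """Relevant paths (Definition 4.4) from s0 whose last transition produces target."""
    init = frozenset(s0)
    found: List[List[Rule]] = []

    def explore(state: FrozenSet[str], path: List[Rule],
                produced: FrozenSet[str], required: FrozenSet[str]) -> None:
        for rule in rules:
            _, reactants, product = rule
            if not set(reactants) <= state:
                continue
            # A self-loop is admitted only under conditions (i)-(iii).
            if product in state and (product not in init or product in produced
                                     or product in required):
                continue
            step = path + [rule]
            if product == target:
                found.append(step)
            else:
                explore(state | {product}, step, produced | {product},
                        required | (set(reactants) & init))

    explore(init, [], frozenset(), frozenset())
    return found


def required_initial(path: List[Rule], s0: Set[str]) -> Set[str]:
    """M(p) from Definition 4.11."""
    used: Set[str] = set()
    produced: Set[str] = set()
    for _, reactants, product in path:
        used |= set(reactants) & s0
        produced.add(product)
    return used - produced


def can_exclude(rules: List[Rule], s0: Set[str], b: str, target: str) -> bool:
    """Some p-path leading to target does not need b from S_0."""
    return any(b not in required_initial(p, s0) for p in p_paths(rules, s0, target))


for b in sorted(S0):
    print(b, can_exclude(EX48, S0, b, "E"))
# A True
# B False
# D False
# O True
```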



Example 4.13 Consider again the network described in Ex. 4.8.

• The initial solution S_0 = {A,B,D,O} is redundant for the production of E. Consider indeed the p-path leading to E: p_1 = S_0 → S_1 → S_1 → S_4 → S_8 → S_13. Note that p_1 is similar to the 𝒞-path p seen in Ex. 4.8, except that it also includes the self-loop transition labeled by rule (θ,ι) on the state S_1. This transition corresponds to a reaction that produces O, which is already in S_0 but is not required until this point. Therefore O could safely be excluded from S_0, since O ∉ ℳ(p_1). Similarly, we can prove that O ∉ ℳ(p) for all the other paths that reach S_13 and all its successors.

• Consider again the 𝒞-path p' = S_0 → S_3 → S_7, leading to E. Note that p' is also a p-path and that O ∉ ℳ(p'). The same result holds for all the paths reaching S_13 and therefore the successor states.

Hence, by Theorem 4.12, we can conclude that the metabolite O could safely be excluded from the initial solution without compromising the production of the metabolite E. If we are not interested in maintaining all the ways to produce E, but just the general ability of the system to produce it, we can exclude A, since p' is a p-path leading to E and A ∉ ℳ(p').

• Note that in this case, checkpoint and exclusion properties rely on the same information: indeed, we could have detected that A could be excluded from the fact that A was not necessary for the production of E. However, this is not true in general. Assume, e.g., that we modify rule (ψ,ν) in order to require the presence of O, as (ψ,ν)': O ⊕ H → E. As a consequence, the paths leading to state S_7 also require the presence of O, which makes O necessary for the production of E. However, we can conclude that, while S_0 is not redundant for the production of E in the modified system, considering p' above, O could be excluded, since throughout p' the modified system is still able to produce E.

• Finally, note that (χ,ϑ) ⊏ (φ,π) and (β,γ) ⊏ (δ,η), while neither (λ,μ) ⊏ (β,γ) nor (β,γ) ⊏ (λ,μ). Indeed, the transitions related to the application of rules (λ,μ) and (β,γ) (t_1 and t_2, resp.) are not causally related; hence, they can be fired in any order.

(1) β-D-Glucose ⊕ ATP → β-D-Glucose-6P ⊕ ADP
(2) β-D-Glucose-6P → β-D-Fructose-6P
(3) β-D-Fructose-6P ⊕ ATP → β-D-Fructose-1,6bP ⊕ ADP
(4) β-D-Fructose-1,6bP → Glyceraldehyde-3-P ⊕ Dihydroxyacetonephosphate
(5) Glyceraldehyde-3-P → Dihydroxyacetonephosphate
(6) Dihydroxyacetonephosphate → Glyceraldehyde-3-P
(7) Glyceraldehyde-3-P ⊕ NAD → 1,3-Bisphosphoglycerate ⊕ NADH
(8) 1,3-Bisphosphoglycerate ⊕ ADP → 3-Phosphoglycerate ⊕ ATP
(9) 3-Phosphoglycerate → 2-Phosphoglycerate
(10) 2-Phosphoglycerate → Phosphoenolpyruvate
(11) Phosphoenolpyruvate ⊕ ADP → Pyruvate ⊕ ATP
(12) β-D-Glucose ⊕ NADP+ → D-Glucono-1,5-Lactone-6P ⊕ NADPH
(13) D-Glucono-1,5-Lactone-6P → 6-Phospho-D-Gluconate
(14) 6-Phospho-D-Gluconate ⊕ NADP+ → Ribulose-5-P ⊕ NADPH
(15) Ribulose-5-P → D-Xylulose-5-P
(16) Ribulose-5-P → D-Ribose-5P
(17) D-Ribose-5P ⊕ D-Xylulose-5-P → Glyceraldehyde-3-P ⊕ D-Sedoheptulose-7-P
(18) D-Sedoheptulose-7-P ⊕ Glyceraldehyde-3-P → D-Erythrose-4P ⊕ D-Fructose-6-P
(19) D-Erythrose-4P ⊕ D-Xylulose-5-P → Glyceraldehyde-3-P ⊕ β-D-Fructose-6P

Table 2: Rules of the Glycolytic Pathway and of the Pentose Phosphate Pathway



5 Properties at work in a metabolic pathway 

A precise characterization of the structural role played by the single elements in the overall metabolic network is relevant both for better understanding living systems and for developing treatments for pathological conditions. As an example, consider the clinical studies of primary and metastatic cancers, which have clearly demonstrated that human malignancies are characterized by an increased activity of glycolysis when compared to normal tissue [11]. This metabolic peculiarity suggests an inviting target for cancer treatment, and various therapeutic strategies aiming at selectively disrupting the glycolytic network of malignant cells are under investigation [18].

In this light, we present a simplified glycolytic pathway embedded in a wider context that also comprises the Pentose Phosphate Pathway. Through these interconnected pathways, β-D-Glucose-6P is oxidized, yielding Pyruvate and energy (ATP) or Ribose and reducing equivalents (NADPH).

The pathway can be formalized as in Tab. 2. For lack of space, we do not show the corresponding LTS graph here; however, it should be clear how our properties, which relate to very important biological features, can be verified using the method illustrated in §4 (see in particular Ex. 4.8 and 4.13).



Reaction properties Suppose that our initial solution is S_α = {β-D-Glucose, ATP, NADP+, NAD}; we can verify the following properties.

• The existence of the following causal chains of rules: (1) ⊏ (2) ⊏ ... ⊏ (11) and (12) ⊏ ... ⊏ (15).
• Rule (7) is essential for the production of the metabolite Pyruvate: its exclusion interrupts all the possible paths reaching Pyruvate.

• Rule (12) is essential for the production of various metabolites, e.g. NADPH and D-Erythrose-4P. 

• The metabolite D-Glucono-1,5-Lactone-6P is a checkpoint for the production of NADPH, produced by both rules (12) and (14), and for that of D-Xylulose-5-P, produced by both rules (15) and (17). Rule (12), which produces D-Glucono-1,5-Lactone-6P, corresponds to the reaction catalyzed by the enzyme Glucose-6-Phosphate Dehydrogenase (G6PD), whose deficiency is an enzymopathy commonly known as favism.

• The metabolite Glyceraldehyde-3-P can be produced either by the 𝒞-path composed of the transitions corresponding to rules (12), (13), (14), (15), (16), (17), or by the 𝒞-path composed of the transitions corresponding to rules (1), (2), (3), (4). The two paths correspond to two vicarious explanations.

• The metabolite NADP+ should be included in the initial solution in order to produce D-Xylulose-5-P, but it can be excluded as far as the production of β-D-Fructose-1,6bP is concerned.

• Finally, with the initial solution S_β = S_α ∪ {Glyceraldehyde-3-P}, rules (4) and (5) are mutually essential for Dihydroxyacetonephosphate, as they must both be removed in order to suppress its production. Another example of mutually essential rules is given in [5].
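For a network like the one in Tab. 2, some of these checks can be approximated cheaply. Because states only grow, a metabolite is reachable from S_0 iff it belongs to the fixpoint obtained by firing enabled rules until nothing new appears; and since dropping self-loops from a path (and stopping at the first state containing C) yields a 𝒞-path, a rule is essential for a reachable C ∉ S_0 exactly when C becomes unreachable once that rule is removed. The sketch assumes that a rule with two products adds both at once (the semantics of §4 records a single product per transition) and writes β as `beta-`; the helper names are hypothetical.

```python
from typing import List, Set, Tuple

# Rules of Tab. 2 as (number, reactants, products).
TABLE2: List[Tuple[int, Tuple[str, ...], Tuple[str, ...]]] = [
    (1, ("beta-D-Glucose", "ATP"), ("beta-D-Glucose-6P", "ADP")),
    (2, ("beta-D-Glucose-6P",), ("beta-D-Fructose-6P",)),
    (3, ("beta-D-Fructose-6P", "ATP"), ("beta-D-Fructose-1,6bP", "ADP")),
    (4, ("beta-D-Fructose-1,6bP",), ("Glyceraldehyde-3-P", "Dihydroxyacetonephosphate")),
    (5, ("Glyceraldehyde-3-P",), ("Dihydroxyacetonephosphate",)),
    (6, ("Dihydroxyacetonephosphate",), ("Glyceraldehyde-3-P",)),
    (7, ("Glyceraldehyde-3-P", "NAD"), ("1,3-Bisphosphoglycerate", "NADH")),
    (8, ("1,3-Bisphosphoglycerate", "ADP"), ("3-Phosphoglycerate", "ATP")),
    (9, ("3-Phosphoglycerate",), ("2-Phosphoglycerate",)),
    (10, ("2-Phosphoglycerate",), ("Phosphoenolpyruvate",)),
    (11, ("Phosphoenolpyruvate", "ADP"), ("Pyruvate", "ATP")),
    (12, ("beta-D-Glucose", "NADP+"), ("D-Glucono-1,5-Lactone-6P", "NADPH")),
    (13, ("D-Glucono-1,5-Lactone-6P",), ("6-Phospho-D-Gluconate",)),
    (14, ("6-Phospho-D-Gluconate", "NADP+"), ("Ribulose-5-P", "NADPH")),
    (15, ("Ribulose-5-P",), ("D-Xylulose-5-P",)),
    (16, ("Ribulose-5-P",), ("D-Ribose-5P",)),
    (17, ("D-Ribose-5P", "D-Xylulose-5-P"), ("Glyceraldehyde-3-P", "D-Sedoheptulose-7-P")),
    (18, ("D-Sedoheptulose-7-P", "Glyceraldehyde-3-P"), ("D-Erythrose-4P", "D-Fructose-6-P")),
    (19, ("D-Erythrose-4P", "D-Xylulose-5-P"), ("Glyceraldehyde-3-P", "beta-D-Fructose-6P")),
]
S_ALPHA = {"beta-D-Glucose", "ATP", "NADP+", "NAD"}


def closure(rules, s0: Set[str]) -> Set[str]:
    """Metabolites reachable from s0: fire enabled rules until nothing new appears."""
    present = set(s0)
    changed = True
    while changed:
        changed = False
        for _, reactants, products in rules:
            if set(reactants) <= present and not set(products) <= present:
                present |= set(products)
                changed = True
    return present


def essential_for(rules, s0: Set[str], number: int, target: str) -> bool:
    """Rule `number` is essential for target iff target is unreachable without it."""
    without = [rule for rule in rules if rule[0] != number]
    return target not in closure(without, s0)


print("Pyruvate" in closure(TABLE2, S_ALPHA))                    # True
print(essential_for(TABLE2, S_ALPHA, 7, "Pyruvate"))             # True
print(essential_for(TABLE2, S_ALPHA, 12, "NADPH"))               # True
print("D-Xylulose-5-P" in closure(TABLE2, S_ALPHA - {"NADP+"}))  # False
```

The printed values agree with the properties listed above for rules (7) and (12) and for NADP+.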

Network properties In order to illustrate our definition of robustness, we consider two pathways: the pathway described above and another one, obtained from the first by suppressing the two reactions represented by rules (5) and (6), each the inverse of the other. This suppression corresponds to the inhibition of an enzyme related to a severe disease known as triosephosphate isomerase (TPI) deficiency; see [1] for details. Considering the standard solution S_α, it is easy to verify that the glycolytic pathway is more robust than its variant related to the disease with respect to the production of Pyruvate: only one of the explanations for Pyruvate existing in the original network is viable in the second one. This simple example highlights the relevance that a study of robustness may have. Quite naturally, this notion can be extended to consider robustness with respect to different solutions or with respect to different metabolites of the same network. As for the latter, intuitively, it turns out that there are at least two explanations for metabolites like Glyceraldehyde-3-P, 2-Phosphoglycerate and Pyruvate, given the solution S_α in the glycolytic pathway. Instead, only one explanation exists for metabolites like β-D-Glucose-6P or. Therefore, the network is more robust for the production of Glyceraldehyde-3-P, 2-Phosphoglycerate and Pyruvate than for that of β-D-Glucose-6P. From a drug research point of view, targeting the parts of the network involved in the production of the last two metabolites may be more effective than targeting the others. Indeed, a drug targeting the reaction of hexokinase, which leads to the production of β-D-Glucose-6P, is under development [18].

6 Conclusions 

We have presented a taxonomy of biological properties of interest regarding metabolic networks. Based on a (formal) notion of causality, this taxonomy translates a set of properties in use among biologists into a formal framework. We have also proposed a computational counterpart of the framework which, by allowing the automated verification of the properties, paves the way to the development of software tools supporting the analysis of metabolic networks. We have chosen a reading of causality and computational mechanisms that rely on theories developed in concurrency, which are particularly suitable for describing causality in interactive behaviours and provide a wealth of analysis techniques. The definitions do not depend on the computational framework, which can be changed whenever another computational support proves more convenient for a specific domain or set of (causality-based) properties.

Future work concerns the extension of the set of proposed properties and experimentation with case studies of interest for wet-lab research, possibly contrasting the framework with other analogous proposals, especially as far as the trade-off between expressiveness and efficiency is concerned. Moreover, we would like to attempt a characterization of properties of sets of interconnected signaling pathways, like the ones involved in carcinogenesis, since understanding the structural features underlying their interactions may provide useful hints for drug research. In this sense, it could be worth studying possible integrations of our framework with the qualitative logical view adopted in [3] for signaling networks.